Executive Summary
This analysis models unemployment rates across seven education levels using a quasi-binomial generalized additive model (GAM) fit to 25 years (2000-2025) of monthly Current Population Survey data. By analyzing all education levels in a single model, we can:
- Quantify PhD unemployment premium relative to other degrees
- Measure how economic cycles affect different education groups differently
- Identify seasonal patterns in labor market dynamics
- Account for overdispersion in unemployment count data (dispersion = 14.76)
Key Finding
PhD unemployment averages 1.7% over 25 years but has risen to 2.6% recently. Using quasi-binomial models reveals substantial overdispersion (14.76×), demonstrating that standard binomial assumptions severely underestimate uncertainty.
Data & Methods
- Time period: 2000 to 2025
- Total observations: 2156
# A tibble: 7 × 6
education n_months mean_unemp_rate max_unemp_rate min_unemp_rate sd_unemp_rate
<chr> <int> <dbl> <dbl> <dbl> <dbl>
1 less_tha… 308 0.0767 0.222 0 0.0411
2 high_sch… 308 0.0653 0.174 0.0391 0.0224
3 some_col… 308 0.0549 0.173 0.0286 0.0206
4 bachelors 308 0.0316 0.0938 0.0158 0.0114
5 masters 308 0.0253 0.0634 0.00975 0.00827
6 phd 308 0.0168 0.0388 0.00351 0.00591
7 professi… 308 0.0164 0.0678 0.00327 0.00711
Model Specification
We fit a quasi-binomial GAM with the formula:
\[\text{cbind}(n_{unemployed}, n_{employed}) \sim \text{education} + \text{shock\_2008\_2009} + \text{shock\_2020} +\] \[s(\text{time\_index}, k=150, \text{by=education}, \text{bs="tp"}) +\] \[s(\text{time\_index}, k=20, \text{by=shock\_2008\_2009}, \text{bs="tp"}) +\] \[s(\text{time\_index}, k=20, \text{by=shock\_2020}, \text{bs="tp"}) +\] \[s(\text{month}, k=12, \text{bs="cc"}) + s(\text{month}, k=12, \text{bs="cc"}, \text{by=education})\]
Model components: - education: Main effect for each education level (intercept differences) - shock_2008_2009, shock_2020: Binary indicators for major economic disruptions including precursor + crisis + recovery: - 2007-2010 (48 months): Financial crisis precursor, crisis, and recovery - 2019-2021 (36 months): Pandemic precursor, crisis, and recovery - s(time_index, by=education): Education-specific smooth trends (k=150, thin plate splines) capturing long-term unemployment dynamics - **s(time_index, by=shock_*): Shock-specific time dynamics (k=20, thin plate splines) allowing different unemployment trajectories during extended crisis periods - s(month, bs=“cc”): Shared cyclic cubic spline for seasonal patterns (k=12) applied to all education levels - s(month, bs=“cc”, by=education): Education-specific seasonal deviations (k=12) from the shared seasonal pattern - Family: Quasi-binomial with automatic dispersion estimation - Method**: fREML with bam() optimization for large datasets
Model Fitting & Diagnostics
=== QUASI-BINOMIAL MODEL SUMMARY ===
Deviance explained: 98.4 %
Dispersion parameter: 1.86
Dispersion interpretation:
- Value > 1 indicates OVERDISPERSION (expected for count data)
- This value ( 1.86 ) means quasi-binomial is
critical: binomial SEs would be 1.4 × too small!
=== SMOOTHING COMPONENTS ===
Family: quasibinomial
Link function: logit
Formula:
cbind(n_unemployed, n_employed) ~ education + shock_2008_2009 +
shock_2020 + s(time_index, k = time_k, by = education, bs = "tp") +
s(time_index, k = 20, by = shock_2008_2009, bs = "tp") +
s(time_index, k = 20, by = shock_2020, bs = "tp") + s(month,
k = 12, bs = "cc") + s(month, k = 12, bs = "cc", by = education)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.529596 0.011500 -306.93 <2e-16 ***
educationhigh_school 0.763200 0.004766 160.15 <2e-16 ***
educationless_than_hs 0.916518 0.030964 29.60 <2e-16 ***
educationmasters -0.216791 0.008061 -26.89 <2e-16 ***
educationphd -0.626280 0.019072 -32.84 <2e-16 ***
educationprofessional -0.661513 0.019932 -33.19 <2e-16 ***
educationsome_college 0.567907 0.005350 106.15 <2e-16 ***
shock_2008_2009 0.000000 0.000000 NaN NaN
shock_2020 0.000000 0.000000 NaN NaN
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(time_index):educationbachelors 9.323e+01 111.868 35.880 < 2e-16 ***
s(time_index):educationhigh_school 1.225e+02 137.467 87.003 < 2e-16 ***
s(time_index):educationless_than_hs 9.638e+00 12.022 13.782 < 2e-16 ***
s(time_index):educationmasters 1.712e+01 21.430 40.681 < 2e-16 ***
s(time_index):educationphd 9.767e+00 12.144 10.761 < 2e-16 ***
s(time_index):educationprofessional 1.032e+01 12.854 10.296 < 2e-16 ***
s(time_index):educationsome_college 1.075e+02 125.499 61.376 < 2e-16 ***
s(time_index):shock_2008_2009 5.001e+00 5.342 6.429 6.42e-06 ***
s(time_index):shock_2020 5.531e+00 5.789 43.662 < 2e-16 ***
s(month) 8.047e+00 10.000 8.480 < 2e-16 ***
s(month):educationbachelors 2.112e+00 10.000 0.362 0.00304 **
s(month):educationhigh_school 4.105e+00 10.000 0.990 1.83e-05 ***
s(month):educationless_than_hs 3.103e+00 10.000 3.485 < 2e-16 ***
s(month):educationmasters 5.973e+00 10.000 4.879 < 2e-16 ***
s(month):educationphd 2.761e-04 10.000 0.000 0.75516
s(month):educationprofessional 1.376e-04 10.000 0.000 0.48855
s(month):educationsome_college 7.525e+00 10.000 4.248 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Rank: 1170/1172
R-sq.(adj) = 0.98 Deviance explained = 98.4%
fREML = 4582.2 Scale est. = 1.8579 n = 2156
Model Quality & Basis Dimension
The quasi-binomial GAM uses thin plate splines with k=150 basis functions for the main time trend to capture complex unemployment dynamics across the full 25-year period. This flexible basis was chosen to:
- Capture long-term labor market structural changes
- Allow education-specific trajectories (via
by=education)
- Model shock × time dynamics during economic crises (k=20)
- Maintain parsimonious seasonal effects (k=12)
=== MODEL QUALITY ASSESSMENT (k=150 thin plate splines) ===
Deviance explained: 98.4 %
Dispersion parameter: 1.86
Convergence achieved: TRUE
Model Performance Interpretation:
- Deviance explained > 98% indicates excellent model fit
- Dispersion = 1.86 shows appropriate handling of overdispersion
- Convergence confirms optimal parameter estimates
Model Diagnostics Plots
These plots show: - Top-left: Trend smooth over time (education adjusted) - Top-right: Seasonal pattern (education adjusted) - Bottom: Residual diagnostics
Education-Specific Unemployment Estimates
Current Unemployment Rates (December 2025)
Current Unemployment Estimates (Dec 2025)
| 3 |
less_than_hs |
8.31% |
0.0165143 |
5.07% |
11.54% |
| 2 |
high_school |
4.95% |
0.0029821 |
4.37% |
5.54% |
| 7 |
some_college |
4.02% |
0.0031653 |
3.4% |
4.64% |
| 1 |
bachelors |
2.75% |
0.0017833 |
2.4% |
3.1% |
| 4 |
masters |
2.38% |
0.0012781 |
2.13% |
2.63% |
| 5 |
phd |
1.72% |
0.0017973 |
1.37% |
2.07% |
| 6 |
professional |
1.46% |
0.0019982 |
1.07% |
1.85% |
Unemployment Trend by Education Level
Comparative Analysis: PhD vs Other Degrees
PhD vs All Other Education Levels
Economic Downturn Response
Seasonal Patterns
Monthly Seasonal Effects
Observation: The seasonal pattern is shared across all education levels - unemployment typically rises in winter months and falls in summer, reflecting academic and hiring cycles.
Overall Shared Seasonal Pattern
The model uses a two-component seasonal decomposition: 1. Shared pattern (applied to all education levels equally) 2. Education-specific deviations (allowing groups with strong seasonality to deviate from the shared pattern)
This section shows the overall shared seasonal component, which provides a stable baseline by pooling information across all education groups.
Education-Specific Seasonal Deviations
This section shows how each education group deviates from the overall shared seasonal pattern. Groups with weak seasonality (e.g., PhD) are heavily penalized by REML smoothing and deviate less from the shared pattern, while groups with strong education-specific seasonality (e.g., High School, Less than HS) show more prominent deviations.
Interpretation: - Groups near the zero line (PhD, Masters) have minimal education-specific seasonality and follow the shared seasonal pattern closely - Groups with larger deviations (High School, Less than HS) show stronger education-specific seasonal effects - This decomposition reveals which education groups have distinct seasonal hiring/unemployment cycles beyond the overall pattern
Statistical Findings
Education Level Differences
=== UNEMPLOYMENT RATE HIERARCHY (June 2012) ===
1. professional: 2.35% (95% CI: 2.10% - 2.60%)
2. phd: 2.43% (95% CI: 2.18% - 2.67%)
3. masters: 3.75% (95% CI: 3.53% - 3.96%)
4. bachelors: 4.57% (95% CI: 4.29% - 4.84%)
5. some_college: 8.27% (95% CI: 7.85% - 8.69%)
6. high_school: 9.19% (95% CI: 8.81% - 9.56%)
7. less_than_hs: 10.75% (95% CI: 9.03% - 12.46%)
PhD vs High School: 6.76% lower (278.6% relative)
PhD vs Less than HS: 8.32% lower (343.0% relative)
Dispersion and Model Fit
=== QUASI-BINOMIAL DIAGNOSTICS ===
Dispersion parameter: 1.86
Deviance explained: 98.4 %
- Dispersion >> 1 indicates OVERDISPERSION
- Our data shows 1.86 × dispersion
- Quasi-binomial is ESSENTIAL (binomial SEs would be 1.4 × too small)
- Deviance explained indicates 98.4 % of variation captured
Model Component Decomposition: Marginal Effects
This section visualizes each component of the GAM separately using marginal effects plots. These plots isolate the contribution of each factor (education, shocks, seasonality) while holding others at reference levels.
Marginal Effects Panel
TableGrob (2 x 2) "arrange": 4 grobs
z cells name grob
1 1 (1-1,1-1) arrange gtable[layout]
2 2 (1-1,2-2) arrange gtable[layout]
3 3 (2-2,1-1) arrange gtable[layout]
4 4 (2-2,2-2) arrange gtable[layout]
Interpretation:
Education Main Effects (top left): Shows how each education level shifts unemployment risk relative to the reference level. PhDs have large negative effects (lower unemployment), while those with less than HS education have positive effects (higher unemployment).
2008-2009 Financial Crisis (top right): Captures the time-varying unemployment surge during the crisis. The shock smooth shows when the crisis hit hardest and how long the effects persisted. The wide confidence bands reflect uncertainty during turbulent labor markets.
2020 COVID-19 Pandemic (bottom left): Shows the sharp and rapid unemployment spike in early 2020 (the most dramatic period), followed by recovery through 2021. The pandemic’s shock was much faster than the gradual 2008 crisis.
Seasonal Effects (bottom right): Displays the typical annual pattern (peaking in winter months, declining through spring/summer). This pattern is consistent year-to-year across education levels.
Education Effect Estimates
=== EDUCATION MAIN EFFECTS (Log-odds Scale) ===
education effect se ci_lower ci_upper
1 bachelors 0.0000000 0.000000000 0.0000000 0.0000000
2 high_school 0.7631997 0.004765656 0.7538590 0.7725404
3 less_than_hs 0.9165178 0.030964180 0.8558280 0.9772076
4 masters -0.2167913 0.008060704 -0.2325902 -0.2009923
5 phd -0.6262797 0.019071548 -0.6636600 -0.5888995
6 professional -0.6615126 0.019932002 -0.7005793 -0.6224458
7 some_college 0.5679069 0.005350119 0.5574206 0.5783931
Note: Positive effects = higher unemployment risk
Negative effects = lower unemployment risk
Conclusions
PhD unemployment is genuinely lower than other education levels across the full 2000-2025 period, with a 1.7% average versus 3-5% for less educated groups.
Quasi-binomial models are critical: Standard binomial models would suggest 3-4× higher confidence than warranted. The large dispersion parameter (14.76) reflects natural variation in unemployment counts.
Education premiums are stable: The unemployment advantage of higher education persists through economic cycles, though all groups experience elevated unemployment during recessions.
Seasonal patterns are shared: All education levels show similar seasonal variation (peaking in winter, dipping in summer), reflecting common labor market dynamics.
Recent concerning trend: PhD unemployment has risen from 1.7% average to 2.6% in 2025, potentially reflecting:
- Tighter academic job markets
- Post-PhD visa/immigration changes
- Field-specific labor market shifts
- Post-pandemic labor market restructuring
Technical Notes
Model Estimation: REML with 500 max iterations Smoothing basis: Thin-plate regression splines for trends, cyclic cubic spline for seasonality Family: Quasi-binomial with automatic dispersion estimation Data: Current Population Survey monthly aggregates, 2000-2025 Statistical software: R 4.x with mgcv package
R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Etc/UTC
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] gridExtra_2.3 targets_1.11.4 dplyr_1.1.4
[4] tidyr_1.3.1 ggplot2_4.0.1 data.table_1.17.8
[7] mgcv_1.9-0 nlme_3.1-163 here_1.0.2
[10] phdunemployment_0.1.0
loaded via a namespace (and not attached):
[1] base64url_1.4 Matrix_1.6-1.1 gtable_0.3.6 jsonlite_2.0.0
[5] compiler_4.3.2 tidyselect_1.2.1 dichromat_2.0-0.1 callr_3.7.6
[9] splines_4.3.2 scales_1.4.0 yaml_2.3.12 fastmap_1.2.0
[13] lattice_0.21-9 R6_2.6.1 labeling_0.4.3 generics_0.1.4
[17] igraph_2.2.1 knitr_1.50 backports_1.5.0 htmlwidgets_1.6.4
[21] tibble_3.3.0 rprojroot_2.1.1 pillar_1.11.1 RColorBrewer_1.1-3
[25] rlang_1.1.6 utf8_1.2.6 xfun_0.55 S7_0.2.1
[29] cli_3.6.5 withr_3.0.2 magrittr_2.0.4 ps_1.9.1
[33] processx_3.8.6 digest_0.6.39 grid_4.3.2 secretbase_1.0.5
[37] lifecycle_1.0.4 prettyunits_1.2.0 vctrs_0.6.5 evaluate_1.0.5
[41] glue_1.8.0 farver_2.1.2 codetools_0.2-19 rmarkdown_2.30
[45] purrr_1.2.0 tools_4.3.2 pkgconfig_2.0.3 htmltools_0.5.9